Use matching MPI datatype for size_t reduction #142

Merged
romerojosh merged 2 commits into NVIDIA:main from fallintoplace:fix/nvshmem-size-t-allreduce
Jun 22, 2026
Conversation

@fallintoplace

Contributor

Summary

Fix the NVSHMEM allocation-size reduction in cudecompMalloc to use an MPI datatype that matches the actual size_t typedef.

The previous code reduced buffer_size_bytes directly with MPI_LONG_LONG_INT. That is ABI-sensitive because buffer_size_bytes is size_t, which may be unsigned long, unsigned long long, or another unsigned integer type depending on the platform.
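One way to see why the signedness of the declared datatype matters is the minimal sketch below. It is not taken from the PR: it assumes a max-style reduction (the PR text does not state the reduction operator), uses fixed-width 64-bit types in place of size_t and long long, and the helper names are hypothetical. Values this large are unrealistic for buffer sizes, so this illustrates the principle rather than an observed failure.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the bits of an unsigned 64-bit value as signed,
// the way a reduction declared with a signed datatype would read the buffer.
inline std::int64_t viewAsSigned(std::uint64_t v) {
  std::int64_t s;
  std::memcpy(&s, &v, sizeof(s));
  return s;
}

// Max computed on the values as they really are (unsigned).
inline std::uint64_t maxUnsigned(std::uint64_t a, std::uint64_t b) {
  return std::max(a, b);
}

// Max computed after reinterpreting both operands as signed (assumed
// behavior of a mismatched signed reduction), returned as unsigned bits.
inline std::uint64_t maxViaSigned(std::uint64_t a, std::uint64_t b) {
  const std::int64_t m = std::max(viewAsSigned(a), viewAsSigned(b));
  std::uint64_t out;
  std::memcpy(&out, &m, sizeof(out));
  return out;
}
```

With operands whose top bit is set, the two functions disagree, because the signed view treats such values as negative. A width mismatch (for example, a 4-byte size_t reduced as an 8-byte type) is a separate hazard that this sketch does not model.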

This PR adds a small internal helper that maps size_t to the matching unsigned MPI datatype before the MPI_Allreduce. It also updates the adjacent warning message to print buffer_size_bytes with %zu.

Tests

  • git diff --check
  • Inspected remaining MPI_LONG_LONG_INT uses; the ones in autotune.cc reduce int64_t values, not size_t

Local formatter/build/test execution was not available in this checkout because clang-format, nvcc, and mpicxx are not installed.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@romerojosh

Collaborator

/build

@github-actions


🚀 Build workflow triggered! View run

@github-actions


✅ Build workflow passed! View run

@romerojosh romerojosh left a comment

Collaborator

Thanks for another contribution @fallintoplace! The changes look good to me, but two other locations still use the old MPI_LONG_LONG_INT and should be updated to use this new utility function before landing.
https://github.com/NVIDIA/cuDecomp/blob/main/src/autotune.cc#L267
https://github.com/NVIDIA/cuDecomp/blob/main/src/autotune.cc#L723

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@romerojosh romerojosh merged commit bec5142 into NVIDIA:main Jun 22, 2026
4 checks passed
@romerojosh

Collaborator

LGTM! Thanks for the contribution @fallintoplace!

2 participants